QVAC-20984 feat: add analytic gradchecked backward pass for the CAMPPlus speaker encoder #61
Merged
…oder

Make CAMPPlus differentiable for the voice-clone enrollment loop: an analytic C++ backward returning d(loss)/d(fbank) with frozen weights (target-WAV embedding stays forward-only). Mirrors campplus_embed_cpu in channel-major layout. Covers FCM (Conv2d + residual blocks), TDNN, CAMDenseTDNN blocks (context-attention gate + dense concat), stats pooling and the dense head.

Tests (always-on unit tier, model-free):
- test-campplus-backward: gradcheck every primitive + full chain vs central finite differences (Task 2 harness).
- test-campplus-backward-parity: analytic double forward vs production campplus_embed_cpu on synthetic weights.

QVAC-20984
Review Status
Current Status: ❌ PENDING
Pending reviews: needs 1 Management or Team Lead, and 1 more from Management, Team Lead, or Member.
Address PR #61 review notes (non-blocking):
- Parity test now builds CAM blocks with num_layers 2/3/2 (was 1/1/1) so the dense-concat accumulation (layer i enters with C_in + i*growth) is anchored to the production forward, not only to the self-referential full-chain gradcheck. Parity stays green (max_abs ~4.6e-08, max_rel ~8.9e-08).
- Document the trust chain in the parity test header and the gap-matrix doc: every campplus_embed caller in the repo (main.cpp, test-campplus, test-voice-embedding) uses the scalar CPU forward, which is validated against the Python reference; campplus_embed_ggml is not wired to any caller yet.
GustavoA1604
approved these changes
Jun 22, 2026
Zbig9000
approved these changes
Jun 22, 2026
What
Makes the CAMPPlus speaker encoder differentiable for voice-clone enrollment by
adding an analytic, model-free C++ backward pass that returns
`d(loss)/d(fbank)`, validated against the Task 2 finite-difference gradcheck
harness. In the enrollment loop CAMPPlus provides the speaker-similarity loss;
the target-WAV embedding stays forward-only (constant) and only the
generated-audio path needs gradients, so the required gradient is the one with
respect to the input `fbank`, with the model weights frozen.
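To illustrate why only the generated-audio path needs a gradient, here is a minimal sketch of a similarity loss whose target embedding is a constant. The cosine form of the loss and the names `cosine_loss`, `cosine_loss_grad_gen` and `dot_product` are assumptions for illustration; the PR does not state which similarity loss the enrollment loop uses. The returned vector plays the role of the `d_emb` argument that a backward pass would consume.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Plain dot product of two equally sized vectors.
double dot_product(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Assumed loss (hypothetical): 1 - cos(gen, tgt).
// tgt is the target-WAV embedding and is treated as a constant.
double cosine_loss(const Vec& gen, const Vec& tgt) {
    const double ng = std::sqrt(dot_product(gen, gen));
    const double nt = std::sqrt(dot_product(tgt, tgt));
    return 1.0 - dot_product(gen, tgt) / (ng * nt);
}

// Gradient of the loss with respect to the generated embedding only.
// No gradient is produced for tgt because it stays forward-only.
Vec cosine_loss_grad_gen(const Vec& gen, const Vec& tgt) {
    const double gg = dot_product(gen, gen);
    const double gt = dot_product(gen, tgt);
    const double ng = std::sqrt(gg);
    const double nt = std::sqrt(dot_product(tgt, tgt));
    Vec d(gen.size());
    for (std::size_t i = 0; i < gen.size(); ++i) {
        // d(cos)/d(gen_i) = tgt_i / (|gen||tgt|) - (gen.tgt) gen_i / (|gen|^3 |tgt|)
        d[i] = -(tgt[i] / (ng * nt) - gt * gen[i] / (gg * ng * nt));
    }
    return d;
}
```

Because cosine similarity is invariant to the scale of `gen`, the gradient is orthogonal to `gen`, which gives a cheap sanity check alongside a finite-difference comparison.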
Follows the same pattern as the sibling tickets already on `master`
(#55 text-encoder tail / QVAC-20978, #58 vector estimator / QVAC-20982,
#60 vocoder / QVAC-20983): a pure `double` reference backward, gradchecked
component-wise, with the op×backend gap documented. Dependencies: Task 2
(QVAC-20979).
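The component-wise gradcheck idea can be sketched as follows. This is not the Task 2 harness; `gradcheck_max_err`, `sigmoid_forward`, `sigmoid_backward` and `sigmoid_gradcheck_demo` are hypothetical names, and the step size and error scaling are assumptions. It compares an analytic input-gradient against central finite differences of a scalar loss, using sigmoid as the primitive under test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Returns the worst scaled difference between the analytic gradient and the
// central finite difference (L(x + eps) - L(x - eps)) / (2 eps), per element.
double gradcheck_max_err(const std::function<double(const Vec&)>& loss,
                         const Vec& x, const Vec& analytic, double eps = 1e-6) {
    assert(x.size() == analytic.size());
    double worst = 0.0;
    Vec xp = x;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xp[i] = x[i] + eps;
        const double lp = loss(xp);
        xp[i] = x[i] - eps;
        const double lm = loss(xp);
        xp[i] = x[i];  // restore before moving to the next element
        const double numeric = (lp - lm) / (2.0 * eps);
        const double scale =
            std::max({1.0, std::fabs(numeric), std::fabs(analytic[i])});
        worst = std::max(worst, std::fabs(numeric - analytic[i]) / scale);
    }
    return worst;
}

Vec sigmoid_forward(const Vec& x) {
    Vec y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = 1.0 / (1.0 + std::exp(-x[i]));
    return y;
}

// Input-gradient of sigmoid: dx = dy * y * (1 - y), with y the cached output.
Vec sigmoid_backward(const Vec& y, const Vec& dy) {
    assert(y.size() == dy.size());
    Vec dx(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) dx[i] = dy[i] * y[i] * (1.0 - y[i]);
    return dx;
}

// Demo: loss = sum_i w[i] * sigmoid(x[i]), so d(loss)/d(y) is simply w.
double sigmoid_gradcheck_demo() {
    const Vec x = {-1.5, -0.2, 0.0, 0.7, 2.3};
    const Vec w = {0.3, -1.1, 0.8, 0.5, -0.4};
    auto loss = [&](const Vec& v) {
        const Vec y = sigmoid_forward(v);
        double s = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) s += w[i] * y[i];
        return s;
    };
    const Vec grad = sigmoid_backward(sigmoid_forward(x), w);
    return gradcheck_max_err(loss, x, grad);
}
```

Working in `double` keeps the finite-difference noise well below the tolerance, which is one reason a pure `double` reference is convenient for this kind of check.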
Forward-parity anchor
A gradcheck alone is self-referential: it only proves the backward is the exact
derivative of its own forward. To tie that to the real model, a second test
asserts the analytic `double` forward matches the production scalar forward
(`campplus_embed_cpu`) on synthetic weights (max_abs ≈ 3e-8, i.e.
`float`-vs-`double` rounding only). Building it surfaced that
`campplus_embed_cpu`'s `fcm_forward` hardcodes the input feature dim to 80, so
the production CPU path is only self-consistent at `feat_dim=80` (the parity
test uses that). The analytic backward derives every dimension from
`feat_dim`, so it is geometry-agnostic.
Changes
- `src/campplus_backward.{h,cpp}`: new `CampplusBackward` class (namespace
  `cp_grad`). Owns the frozen weights and caches per-call activations as state;
  public surface is `forward(fbank)` / `backward(d_emb)`. Channel-major `(C, T)`
  layout mirroring `campplus_embed_cpu` exactly. Implements the CAMPPlus
  primitives and their input-gradients: stride/pad/dilation-aware conv1d/conv2d,
  pre-fused affine batch norm, ReLU, sigmoid, time-mean, segment pooling,
  statistics pooling (mean + unbiased std), the FCM Conv2d residual block (with
  optional shortcut) and the CAMDenseTDNN layer (context-attention gate + dense
  concat split).
- `test/test_campplus_backward.cpp`: gradchecks every primitive, the FCM
  residual block, the CAM dense-TDNN layer and the full chain (12 checks) against
  central finite differences via the Task 2 harness. Always-on `unit` ctest tier
  (no model/fixtures, no-skip policy).
- `test/test_campplus_backward_parity.cpp`: forward parity vs the production
  `campplus_embed_cpu` (see above). Also `unit` tier.
- `docs/voiceclone-backward-campplus.md`: op×backend gap matrix and
  CPU-fallback rationale for enrollment.
- `CMakeLists.txt`: register the `test-campplus-backward` and
  `test-campplus-backward-parity` targets.
CPU fallback (documented)
`SIGMOID`, `SQRT`, `MEAN`, `SUM_ROWS`, `PAD`, `REPEAT` and `CONCAT` have no
backward in the vendored `ggml`, so the enrollment backward cannot use ggml
autodiff on any backend. It is provided as the analytic C++ backward and runs on
CPU (enrollment is offline; the realtime synthesis GPU fast paths are untouched).
See the doc for the full matrix.
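As an illustration of what a hand-written backward looks like for ops of this kind, here is a sketch of statistics pooling (mean + unbiased std) with its input-gradient, which combines mean and square-root steps. This is not the code from `src/campplus_backward.cpp`; the function names, the absence of an epsilon inside the square root, and the recomputation of the forward inside the backward are all assumptions made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// x is channel-major (C, T): element (c, t) lives at x[c * T + t].
// Output is [mean_0..mean_{C-1}, std_0..std_{C-1}], std using the T-1 denominator.
Vec stats_pool_forward(const Vec& x, int C, int T) {
    assert(T > 1 && x.size() == static_cast<std::size_t>(C) * T);
    Vec out(2 * C);
    for (int c = 0; c < C; ++c) {
        double m = 0.0;
        for (int t = 0; t < T; ++t) m += x[c * T + t];
        m /= T;
        double ss = 0.0;
        for (int t = 0; t < T; ++t) {
            const double d = x[c * T + t] - m;
            ss += d * d;
        }
        out[c] = m;
        out[C + c] = std::sqrt(ss / (T - 1));
    }
    return out;
}

// Input-gradient. For each channel:
//   d(mean)/d(x_t) = 1 / T
//   d(std)/d(x_t)  = (x_t - mean) / ((T - 1) * std)
// Assumes std > 0 (a constant channel would divide by zero here).
Vec stats_pool_backward(const Vec& x, const Vec& d_out, int C, int T) {
    assert(d_out.size() == static_cast<std::size_t>(2 * C));
    const Vec out = stats_pool_forward(x, C, T);  // a cached forward would avoid this
    Vec dx(x.size());
    for (int c = 0; c < C; ++c) {
        const double m = out[c], s = out[C + c];
        const double dm = d_out[c], ds = d_out[C + c];
        for (int t = 0; t < T; ++t) {
            dx[c * T + t] = dm / T + ds * (x[c * T + t] - m) / ((T - 1) * s);
        }
    }
    return dx;
}

// Self-check: worst absolute gap to central finite differences for
// loss = dot(w, stats_pool_forward(x)).
double stats_pool_fd_error() {
    const int C = 2, T = 4;
    Vec x = {0.5, -1.0, 2.0, 0.3, 1.2, 0.1, -0.7, 0.9};
    const Vec w = {0.7, -0.3, 1.1, 0.4};
    const Vec dx = stats_pool_backward(x, w, C, T);
    auto loss = [&](const Vec& v) {
        const Vec o = stats_pool_forward(v, C, T);
        double s = 0.0;
        for (std::size_t k = 0; k < o.size(); ++k) s += w[k] * o[k];
        return s;
    };
    const double eps = 1e-6;
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double saved = x[i];
        x[i] = saved + eps;
        const double lp = loss(x);
        x[i] = saved - eps;
        const double lm = loss(x);
        x[i] = saved;
        worst = std::max(worst, std::fabs((lp - lm) / (2.0 * eps) - dx[i]));
    }
    return worst;
}
```

The std term has no cross-channel or mean-correction component because the deviations from the mean sum to zero within a channel, so the derivative through the mean cancels.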
Acceptance
Gradcheck green:
`test-campplus-backward` and `test-campplus-backward-parity` both pass (2/2 in the
`unit` tier).
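The forward-parity check can be sketched in the same spirit. The toy forward below is a stand-in, not CAMPPlus, and the names `toy_forward`, `Parity`, `compare` and `toy_parity_demo` are hypothetical. It runs the same computation once in `float` and once in `double` from shared `float` weights, then reports the max absolute and max relative differences, which should sit at rounding level.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy stand-in for a forward pass, templated on the accumulation type:
// y[o] = relu(b[o] + sum_i w[o * n_in + i] * x[i]).
template <typename T>
std::vector<T> toy_forward(const std::vector<float>& w,
                           const std::vector<float>& b,
                           const std::vector<float>& x) {
    const std::size_t n_out = b.size(), n_in = x.size();
    assert(w.size() == n_out * n_in);
    std::vector<T> y(n_out);
    for (std::size_t o = 0; o < n_out; ++o) {
        T acc = static_cast<T>(b[o]);
        for (std::size_t i = 0; i < n_in; ++i) {
            acc += static_cast<T>(w[o * n_in + i]) * static_cast<T>(x[i]);
        }
        y[o] = std::max(acc, T(0));
    }
    return y;
}

struct Parity {
    double max_abs;
    double max_rel;
};

// Compares a float "production" output against a double reference.
Parity compare(const std::vector<float>& prod, const std::vector<double>& ref) {
    assert(prod.size() == ref.size());
    Parity p{0.0, 0.0};
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = std::fabs(static_cast<double>(prod[i]) - ref[i]);
        p.max_abs = std::max(p.max_abs, diff);
        // Guard the denominator so an exact-zero reference does not divide by zero.
        p.max_rel = std::max(p.max_rel, diff / std::max(std::fabs(ref[i]), 1e-12));
    }
    return p;
}

Parity toy_parity_demo() {
    const std::vector<float> w = {0.25f, -0.5f, 0.75f, 1.5f, 0.1f, -0.3f};  // 2 x 3
    const std::vector<float> b = {0.05f, -0.2f};
    const std::vector<float> x = {0.9f, -1.3f, 0.4f};
    return compare(toy_forward<float>(w, b, x), toy_forward<double>(w, b, x));
}
```

A tolerance a few orders of magnitude above `float` epsilon is a reasonable pass criterion for a check like this; a structural mismatch between the two forwards would typically show up as a far larger difference.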